Preface


The linear regression model is one of the most widely used quantitative
tools of the applied social sciences and many of the physical sciences.
The most commonly used technique for this kind of model is ordinary least
squares, because of its low computational cost, its intuitive plausibility
in a wide variety of circumstances, and its support by a broad and
sophisticated body of statistical inference. The least squares tool can
be used on three basic levels:

1.  It  can  be  applied  mechanically,  or  descriptively,  as  a  means  of 
curve  fitting. 

2.  It  enables  us  to  perform  hypothesis  testing. 

3.  It  gives  a  reasonable  way  of  understanding  complex  physical 
and  social  phenomena. 

Let us now denote the regression model by

$$Y = X\beta + \varepsilon,$$

where

$X$ is an $N \times (k+1)$ matrix,
$\beta$ is a $(k+1) \times 1$ vector,
$\varepsilon$ is an $N \times 1$ vector, and
$Y$ is an $N \times 1$ vector.

The assumptions for the least squares method are:

1. $E(\varepsilon) = 0$, i.e. the expected value of the error term
$\varepsilon$ is zero.

2. $E[(\varepsilon - E(\varepsilon))(\varepsilon - E(\varepsilon))'] = \sigma^2 I$,
i.e. all error terms have constant variance $\sigma^2$ and they are
independent.

3. The $X$ matrix is nonstochastic with rank $\rho(X) = k + 1$, i.e.
none of the columns of $X$ is a linear combination of the other columns.

The estimators for the coefficient vector $\beta$ are given by
$\hat{\beta}$:

$$\hat{\beta} = (X'X)^{-1} X'Y$$

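These estimators have the properties:

1. $\hat{\beta}$ is a linear function of $Y$.

2. $E(\hat{\beta}) = \beta$, i.e. unbiasedness.

3. $V(\hat{\beta}) = E[(\hat{\beta} - E(\hat{\beta}))(\hat{\beta} - E(\hat{\beta}))'] = \sigma^2 (X'X)^{-1}$,

and the estimate for $\sigma^2$ is given by $S^2$, where

$$S^2 = \frac{\text{Error sum of squares}}{N - (k+1)}$$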

4. The basic and most important assumption for this model is the
assumption of normality. The confidence interval and testing procedures
are all based on the normality assumption. It is true that the normality
assumption is an important case and that it can sometimes be justified by
the central limit theorem, but it is equally true that the assumption is
made in many cases in which it does not really hold. Two basic questions
arise in this case:

1) How serious are the consequences?

2) To what extent is a test "robust", i.e. to what extent is a test
insensitive to departures from the assumptions under which it is derived?

Two basic issues arise in this connection. First: tests which concern
first moments (such as t-tests for elements of the parameter vector
$\beta$ of the expectation $X\beta$ in the standard linear model) are
relatively insensitive to departures from normality.

Second: tests concerning second moments, such as F-tests, are much
less robust (see Kendall and Stuart 1967, pp. 455). Thus our search here
will be basically for a robust technique that can be applied for
estimating the parameters of the linear model $Y = X\beta + \varepsilon$.
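
As an illustration only, the least squares estimator $(X'X)^{-1}X'Y$ and the variance estimate $S^2$ defined earlier in this preface can be sketched in plain Python. The helper names (`solve`, `ols`) and the four data points are hypothetical and are not taken from the thesis; the normal equations are solved by Gaussian elimination rather than by forming the inverse explicitly.

```python
def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][j] * z[j] for j in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z


def ols(x_rows, y):
    """Return (beta_hat, s2); each row of x_rows already starts with 1."""
    cols = list(zip(*x_rows))                       # columns of X
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)                          # normal equations
    resid = [yi - sum(b * v for b, v in zip(beta, row))
             for row, yi in zip(x_rows, y)]
    sse = sum(r * r for r in resid)                 # error sum of squares
    s2 = sse / (len(y) - len(beta))                 # divide by N - (k + 1)
    return beta, s2


# Hypothetical data: one regressor plus an intercept column.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Y = [1.1, 2.9, 5.2, 6.8]
beta_hat, s2 = ols(X, Y)
```

For these four points the fitted intercept and slope are 1.09 and 1.94, and $S^2$ is 0.041.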
Foreword


During the course "Linear Statistical Models" given at AFIT, I became
interested in regression models because of their wide use in management,
the management sciences, and the social sciences. These models are used
successfully in real-life applications basically because of the sound
understanding of both the underlying theory and the practical
applications themselves.

Robust linear regression is an area of even greater interest, since many
sets of data contain fairly large percentages of "outliers" due to
heavy-tailed error models or to errors in collecting and recording.
Because these outliers have an unusually great influence on "least
squares" estimators (or generalized least squares estimators), robust
procedures attempt to modify those schemes. During a course in robust
statistics given by Dr. A. H. Moore, Professor in the Department of
Mathematics, Air Force Institute of Technology, School of Engineering, I
became interested in the area of robust regression. After talking to
Dr. Moore about my interest in robust regression, we decided to undertake
research in robust multiple linear regression.

I wish to express my thanks to Dr. A. H. Moore, my thesis advisor,
for his valuable remarks, his direction of the research, and his aid in
the accomplishment of my thesis. I also wish to thank Dr. J. P. Cain, my
reader, to whom I am especially indebted for teaching me a great deal
about multiple linear regression.

Finally,  I  owe  my  wife,  Azza,  my  son  Mohamed  and  my  lovely  American 
daughter,  Dina,  a  great  debt  of  love  and  best  wishes  for  their  patience 
and  encouragement  during  my  study  at  AFIT. 


AHMED  MOHAMED  M.  SULTAN 
Abstract


An extensive Monte Carlo analysis is conducted to determine the
performance of robust linear regression techniques with and without
outliers. Thirteen methods of regression are compared, including least
squares and minimum absolute deviation. The classical robust techniques
of Huber and Hampel were studied, and robust techniques using the
Q-statistic as a discriminant were introduced.

The model studied contained eleven variables with 27 observations.
The error distributions considered were the uniform, normal, and double
exponential distributions.

Least  squares  gave  the  best  fit  without  outliers.  In  the  presence 
of  gross  outliers  a  rejection  of  outliers  technique  gave  the  best  fit. 

I. Introduction


Problem Statement

Regression analysis is a statistical technique for expressing the
relationship between variables in a mathematical form. Moreover, it is
considered one of the most widely used statistical techniques because of
its wide range of applications in almost every field. Earlier research
by James E. Flanagan (GOR/81-D) examined the use of Lp-norms and
distance estimation. Due to computer and algorithm limitations it was
only possible to examine the following linear models:

$$y = \beta_0 + \beta_1 x_1 + \varepsilon$$

and

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$

The application envisioned, however, is to try to improve the
"predictive" operations and maintenance cost model (ALPOS model)
developed for the Air Force Avionics Laboratory Systems Evaluation group,
and that linear model used 20 independent variables. The earlier
research demonstrated the feasibility of a generalized approach to the
regression problem but was unable to handle many independent variables.

This thesis envisions using a different (adaptive) approach so that
many independent variables (up to 100) can generally be handled.

Verification of the model can be made by comparing its prediction
capability with the prediction capability of the ALPOS model.

Review  of  Applicable  Literature 

The possible existence of non-normal error distributions having
infinite variance or heavy tails has led statisticians to search for
estimators that are more "robust" than least squares (L.S.) estimators.
By "robust" here one means an estimator that is reasonably efficient
regardless of the form of the underlying error distribution. When the
errors are i.i.d. normal random variables, L.S. estimators are efficient,
and so the search is for estimators that are not much worse than L.S.
when the errors are normally distributed but are substantially better for
non-normal errors.

A large number of estimators have been suggested in a considerable body
of literature. For example, Huber (Ref 43:1041) gave a selective review
of robust statistics, centering on estimates of location and extending
into other estimation and testing problems. In 1973 Huber (Ref 45:799)
defined the maximum likelihood type robust estimates of regression and
investigated their asymptotic properties both theoretically and
empirically. Koenker and Bassett (Ref 55:33) introduced a new class of
statistics for the linear model called "regression quantiles": a simple
minimization problem yielding the ordinary sample quantiles in the
location model generalizes naturally to the linear model. The estimator
which minimizes the sum of absolute residuals is an important special
case. Estimators were suggested which have comparable efficiency to
least squares for the normal linear model while substantially
outperforming the least squares estimator over a wide class of non-normal
error distributions. Another study, by J. McKean and Thomas
Hettmansperger, treated the general linear model based on one-step
R-estimates (Ref 60:571). One-step iterations based on a second
derivative approximation to the surface were proposed. These estimates
can be obtained quickly from initial estimates. Further, the analysis
resulting from these estimates is asymptotically equivalent to the
minimum dispersion analysis, and thus it can be recommended for large
data sets. In addition, Maddala (Ref 57:308) surveyed the work done by
Huber and Anscombe on minimizing

$$\sum_i f\Big(y_i - \sum_k x_{ik}\beta_k\Big)$$

with a different definition of $f$ for each of them, followed by a
discussion of least absolute deviation minimization. A relevant part of
Mosteller (Ref 64:105) also discussed different suggestions for the
solution of linear models with non-normal errors. Finally, Narula
(Ref 66:185) suggested the minimization of the sum of relative errors
(MSRE) as an alternative to least squares. The problem is formulated as
a linear programming problem and a solution procedure is given.


Model  Selected 


The model selected is

$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n + \varepsilon$$

for the problem of property valuation. The objective is to predict $y$,
the sale price of a home, for known values of the variables $x_1$ through
$x_n$, which represent (taxes, number of baths, lot size, ..., lot size,
number of fireplaces). The data, 27 observations on the variables
$(y, x_1, \ldots, x_n)$, were obtained from Multiple Listing, Vol. 87 for
area 12 (Erie,

Choice  of  Error  Models 

In order to see the behavior of the proposed adaptive technique, it
was necessary to add different error distributions to an exact fit of
the data. This is done here by obtaining an estimate $\hat{\beta}_0$ of
$\beta$ and generating exact values of $y$ by multiplying $X$ by
$\hat{\beta}_0$:

$$y = X\hat{\beta}_0$$

The choice of non-normal error distributions is based on the tail length
of the distribution: relative to the normal distribution, the uniform
distribution has shorter tails, while the double exponential distribution
has thicker tails.
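
The construction described above (an exact fit plus simulated errors) might be sketched as follows. This is only a minimal illustration: the function names, the unit scale of each error distribution, and the toy design rows are assumptions and are not taken from the thesis.

```python
import random


def draw_error(kind, rng):
    """Draw one error term; the unit scales used here are assumptions."""
    if kind == "uniform":
        return rng.uniform(-1.0, 1.0)            # shorter tails than normal
    if kind == "normal":
        return rng.gauss(0.0, 1.0)
    if kind == "double_exponential":
        # The difference of two unit exponentials is Laplace(0, 1).
        return rng.expovariate(1.0) - rng.expovariate(1.0)
    raise ValueError("unknown error distribution: " + kind)


def simulate_response(x_rows, beta0, kind, rng):
    """Exact fit y = X beta0 with one simulated error added to each value."""
    exact = [sum(b * v for b, v in zip(beta0, row)) for row in x_rows]
    return [yi + draw_error(kind, rng) for yi in exact]


# Hypothetical two-column design (intercept and one regressor).
rng = random.Random(0)
rows = [[1.0, 2.0], [1.0, 3.0]]
ys = simulate_response(rows, [1.0, 2.0], "uniform", rng)
```

Repeating `simulate_response` many times for each error distribution and refitting each time gives one Monte Carlo replication per call.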


II. Methods of Estimation


As in the general decision problem, there is no single best procedure
for estimating the parameters of a distribution. In a given case under
study, it may be advisable to use the method of moments, Bayes
estimates, minimax estimates, or maximum likelihood estimates.

Method of Moments

This is the oldest method of estimating parameters; it was devised by
K. Pearson about 1894. If there are K parameters to be estimated, the
method consists of expressing the first K population moments in terms of
these K parameters, equating them to the corresponding sample moments,
and taking the solutions of the resulting equations as estimates of the
parameters. The method usually leads to relatively simple estimates.

The estimates obtained in this way are clearly functions of the
sample moments. Since the sample moments are consistent estimates of the
population moments, the parameter estimates will generally be consistent.

Although the asymptotic efficiency of estimates obtained by the
method of moments is often less than 1, such estimates may conveniently
be used as first approximations from which more efficient estimates may
be obtained by other means.
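
As a small worked case with K = 2, consider a gamma distribution with shape $k$ and scale $\theta$, for which the mean is $k\theta$ and the variance is $k\theta^2$. The gamma family is chosen here only for illustration and does not appear in the thesis; the function name and sample are hypothetical. Equating the first two population moments to the sample moments and solving gives the estimates below.

```python
def gamma_moment_estimates(sample):
    """Method-of-moments estimates (shape, scale) for a gamma sample."""
    n = len(sample)
    m1 = sum(sample) / n                      # first sample moment
    m2 = sum(v * v for v in sample) / n       # second sample moment
    var = m2 - m1 * m1                        # implied variance
    scale = var / m1                          # from mean = k*theta, var = k*theta^2
    shape = m1 * m1 / var
    return shape, scale


shape_hat, scale_hat = gamma_moment_estimates([2.0, 4.0, 6.0])
```

For this sample the first moment is 4 and the implied variance is 8/3, so the estimates are shape 6 and scale 2/3; their product reproduces the sample mean, as it must.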

Bayes  Estimates 

In the methods of point estimation the assumption is that the random
sample came from a density $f(\cdot\,;\theta)$, where the function
$f(\cdot\,;\theta)$ is assumed to be known and $\theta$ is some fixed,
though unknown, point. In some real-world situations which the density
$f(\cdot\,;\theta)$ represents, there is often additional information
about $\theta$; i.e. $\theta$ itself may act as a random variable for
which one could postulate a realistic density function.

The Bayes action for a given observation $Z = z$ is that which
minimizes the expected value of the loss with respect to the posterior
distribution. This expected loss, assuming a quadratic loss function
$(\theta - a)^2$, is

$$E_H\left[(\theta - a)^2\right] = \int_{-\infty}^{\infty} (\theta - a)^2 \, dH(\theta),$$

where $H(\theta)$ is the distribution function of the posterior
distribution. Since this expected loss is a second moment of a
distribution, it is minimized when taken about the mean of the
distribution. That is, the minimizing action, and hence the Bayes
estimate of $\theta$, is

$$E_H(\theta) = \int_{-\infty}^{\infty} \theta \, dH(\theta)$$
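
The claim that the posterior mean minimizes the expected quadratic loss can be checked numerically. The sketch below assumes, purely for illustration, a binomial observation (7 successes in 10 trials) with a uniform prior placed on a grid of parameter values; none of these choices comes from the thesis, and the function names are hypothetical.

```python
def posterior_on_grid(successes, trials, grid):
    """Posterior weights under a uniform prior on the grid points."""
    weights = [t ** successes * (1.0 - t) ** (trials - successes) for t in grid]
    total = sum(weights)
    return [w / total for w in weights]


def expected_quadratic_loss(action, grid, post):
    """Posterior expectation of (theta - action)^2."""
    return sum(p * (t - action) ** 2 for t, p in zip(grid, post))


grid = [i / 100 for i in range(1, 100)]
post = posterior_on_grid(7, 10, grid)
bayes_estimate = sum(t * p for t, p in zip(grid, post))    # posterior mean
loss_at_mean = expected_quadratic_loss(bayes_estimate, grid, post)
loss_nearby = expected_quadratic_loss(bayes_estimate + 0.05, grid, post)
```

Here `loss_at_mean` is smaller than `loss_nearby`, and `bayes_estimate` is close to 2/3, the mean of the corresponding continuous Beta(8, 4) posterior.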

Maximum  Likelihood  Estimates 

We shall suppose first that the population of interest is discrete,
so that it is meaningful to speak of the probability that $X = x$, where
$X$ denotes a sample $(X_1, \ldots, X_n)$ and $x$ a possible realization
$(x_1, \ldots, x_n)$. This probability that $X = x$ depends on $x$, of
course, but it also depends on the state of nature $\theta$ which
governs. As a function of $\theta$ for given $x$, it is called the
likelihood function:

$$L(\theta) = P_\theta(X = x)$$

Thinking of a state of nature as a possible "explanation" of observed
data, the maximum likelihood principle considers the "best" explanation
to be the state of nature $\hat{\theta}$ that maximizes the likelihood
function, that is, the one that maximizes the probability of getting
what was actually observed. A maximum likelihood procedure is then one
that is best when the state of nature is the maximum likelihood state,
$\hat{\theta}$. This is determined from the loss function
$\ell(\theta, a)$, regarded as a function of $\theta$ and $a$, as the
action $a$ that minimizes $\ell(\hat{\theta}, a)$ (i.e. the loss
resulting from an action $a$ when the state of nature is taken as
$\hat{\theta}$).

The best explanation $\hat{\theta}$ of a given observation $X = x$
depends on $x$, and so defines a function of $x$, or a statistic. The
rule that says to take the action that minimizes $\ell(\hat{\theta}, a)$,
where $\ell$ is the loss function, assigns this action to the $x$ that
leads to $\hat{\theta}$, and so the maximum likelihood principle defines
a decision function, called the maximum likelihood decision function.

Thus a maximum likelihood estimate is a value $\hat{\theta}$ of
$\theta$ that maximizes the likelihood function. If $\theta$ is
multidimensional, so is $\hat{\theta}$, and the components are said to be
joint maximum likelihood estimates of the corresponding components of
$\theta$.
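
For a discrete population the likelihood can be evaluated directly and maximized over candidate states of nature. The sketch below assumes a Bernoulli population and a grid search; both are illustrative choices, not taken from the thesis, and the sample is hypothetical.

```python
def likelihood(theta, sample):
    """P_theta(X = x) for an i.i.d. Bernoulli(theta) sample of 0s and 1s."""
    out = 1.0
    for x in sample:
        out *= theta if x == 1 else (1.0 - theta)
    return out


sample = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]          # 7 successes in 10 trials
candidates = [i / 1000 for i in range(1, 1000)]  # candidate states of nature
theta_hat = max(candidates, key=lambda t: likelihood(t, sample))
```

The maximizing value is 0.7, the sample proportion, which is the closed-form maximum likelihood estimate for this model.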

Some  Other  Techniques 

A brief mention will be made in this part of certain other techniques
for obtaining estimators involving somewhat more mathematical
preparation than has been provided or assumed. Just as, in general, a
decision procedure can be replaced by one based on a sufficient
statistic, so in estimating a parameter an estimator can be replaced by a
function of a sufficient statistic without deterioration of the risk. In
particular, given an unbiased estimate $U$ of the parameter $h(\theta)$,
an unbiased estimate based on the sufficient statistic $T$ can be
constructed whose variance is not greater than that of $U$. In some
instances the method yields an unbiased estimate of minimum variance.

Given the statistic $U$, then, consider the function

$$g(t) = E(U \mid T = t)$$

If $T$ is sufficient, the conditional distribution of $X$, and therefore
that of the statistic $U$, is independent of the state $\theta$. The
function $g(t)$ really depends, then, only on $t$, as the notation
implies. It defines a statistic

$$V = g(T),$$

whose mean is the same as that of $U$:

$$E(V) = E(E(U \mid T)) = E(U)$$

Consequently, if $U$ is an unbiased estimate of $h(\theta)$, so is $V$.

The variance of $U$ can be expressed as follows (using $E(U) = E(V)$):

$$\mathrm{Var}(U) = E\left((U - E(V))^2\right) = \mathrm{Var}(V) + E\left((U - V)^2\right) + 2E\left((U - V)(V - E(V))\right)$$

The assertion that $\mathrm{Var}(U) \ge \mathrm{Var}(V)$ will be
established as soon as it is shown that the cross product term vanishes.
So consider

$$E\left((U - V)(V - E(V))\right) = \int E\left((U - V)(V - E(V)) \mid T = t\right) dF_T(t),$$

where $F_T(t)$ is the distribution function of $T$. Now,

$$E(V - U \mid T = t) = E(V \mid T = t) - E(U \mid T = t) = g(t) - g(t) = 0$$

and

$$E\left((U - V)(V - E(V)) \mid T = t\right) = E\left((U - V)(g(t) - h(\theta)) \mid T = t\right) = (g(t) - h(\theta))\, E(U - V \mid T = t) = 0$$

Thus the above integral vanishes, and $\mathrm{Var}(U) \ge \mathrm{Var}(V)$.
The variance of $V$ is actually smaller if $U$ does not depend on the
data through the value of $T$ only, and so one can do better using $V$
than using $U$. Clearly, any estimator that is unbiased and has a
smaller variance than does $g(T)$ would also have to be a function of the
sufficient statistic $T$ (since otherwise the preceding technique would
yield a function of $T$ that does at least as well). But if there is
such a function, $K(T)$, also unbiased in estimating $h(\theta)$, then

$$E\,K(T) = h(\theta) = E\,g(T)$$

for all $\theta$. Frequently the family of densities for $T$ has the
property of completeness, which says that if

$$\int_{-\infty}^{\infty} K(t)\, dF_T(t) = \int_{-\infty}^{\infty} g(t)\, dF_T(t)$$

for all $\theta$, then $K(t)$ is essentially the same function as $g(t)$.
In this event $g(T)$ is actually an unbiased estimate of $h(\theta)$ with
minimum variance.
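
The variance reduction obtained by conditioning on a sufficient statistic can be verified exactly in a small case. The sketch below assumes a Bernoulli sample of size 4 with $U = X_1$ and $T = \sum X_i$, so that $V = E(U \mid T) = T/n$; this example and the helper name are illustrative assumptions, not part of the thesis.

```python
from itertools import product


def exact_moments(stat, n, p):
    """Exact mean and variance of stat(x) over all 0/1 samples of size n."""
    mean = 0.0
    second = 0.0
    for x in product((0, 1), repeat=n):
        prob = 1.0
        for xi in x:
            prob *= p if xi == 1 else 1.0 - p
        value = stat(x)
        mean += prob * value
        second += prob * value * value
    return mean, second - mean * mean


n, p = 4, 0.3
mean_u, var_u = exact_moments(lambda x: x[0], n, p)         # U = X_1
mean_v, var_v = exact_moments(lambda x: sum(x) / n, n, p)   # V = T / n
```

Both statistics have mean 0.3, while the variance falls from 0.21 for $U$ to 0.0525 for $V$.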

Thus, although maximum likelihood estimates are known to be consistent,
asymptotically efficient, and asymptotically normal, there are usually
other estimates that have these properties and which would then appear to
serve just as well for large samples (they might even be better for small
samples). Such estimates are called best asymptotically normal, or BAN,
and can be obtained in various ways.

One class of BAN estimates consists of certain "minimum chi-square"
estimates, defined as follows. Consider a sample $X_1, \ldots, X_n$ from
a vector-valued population $X$ with mean vector $\mu(\theta)$ and
covariance matrix $M(\theta)$, $\theta$ being the parameter to be
estimated (it could be multidimensional). The quadratic expression

$$\chi^2 = n\,(\bar{X} - \mu(\theta))'\,[M(\theta)]^{-1}\,(\bar{X} - \mu(\theta))$$

is minimized as a function of $\theta$ for given $X_1, \ldots, X_n$. The
minimizing value $\hat{\theta}(X_1, \ldots, X_n)$ is called the minimum
chi-square estimate of $\theta$. It is known to be BAN when $X$ has a
distribution belonging to the exponential
III. Robust Procedures


General

A mathematical model is based upon a set of assumptions. These
assumptions are not supposed to be exactly true; they are mathematically
convenient rationalizations of an often fuzzy knowledge or belief. These
rationalizations or simplifications are vital, and one justifies their
use by appealing to a vague continuity or stability principle. This
principle states that "a minor error in the mathematical model should
cause only a small error in the final conclusions."

A statistical inference model, being a kind of mathematical model,
should be consistent with this principle. In the simplest cases there
are implicit and explicit assumptions about randomness and independence,
about distributional models, perhaps prior distributions for some unknown
parameters, and so on.

During the last decade "robust" procedures have been introduced to
resolve the conflict between the model assumptions and the real system
being studied, by providing insensitivity to small deviations from the
assumptions. Basically, we consider distributional robustness, which
means that the true underlying distribution deviates slightly from the
assumed model (usually the Gaussian law).

As an example, Tukey (Ref: 78) introduced the case of a contaminated
normal distribution with contamination factor $\varepsilon$, formed from
the two normal distributions $N(\mu, \sigma^2)$ and $N(\mu, 9\sigma^2)$.
The observations $X_i$ will then be independent and identically
distributed with common underlying distribution $F(x)$, where

$$F(x) = (1 - \varepsilon)\,\Phi\!\left(\frac{x - \mu}{\sigma}\right) + \varepsilon\,\Phi\!\left(\frac{x - \mu}{3\sigma}\right)$$

and

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\, dy$$

is the $N(0,1)$ distribution function.

Two measures of scatter are the mean absolute deviation

$$d_n = \frac{1}{n} \sum_{i=1}^{n} |X_i - \bar{X}|$$

and the mean square deviation

$$S_n = \left[\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2\right]^{1/2}$$

These two measures indicate different characteristics of the error
distribution. The performance of these two measures is summarized by
Huber (Ref: 46) according to the asymptotic relative efficiency (ARE) of
$d_n$ relative to $S_n$, as a function of the contamination factor
$\varepsilon$, given in the following table.

$$\mathrm{ARE}(\varepsilon) = \lim_{n \to \infty} \frac{\mathrm{Var}(S_n) / (E\,S_n)^2}{\mathrm{Var}(d_n) / (E\,d_n)^2}$$
   ε        ARE(ε)

   0        0.876
   0.001    0.948
   0.002    1.016
   0.005    1.198
   0.01     1.439
   0.02     1.752
   0.05     2.035
   0.10     1.903
   0.15     1.689
   0.25     1.371
   0.5      1.017
   1.0      0.876
From this Huber concluded that:


1. The above does not imply that we advocate the use of the mean
absolute deviation (there are still better estimates of scale).

2. The contaminating observations could be considered as outliers,
and by treating them one can get a better estimate of the mean square
error.
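
The tabulated ARE values can be reproduced from a closed form. The sketch below is a derivation under the contaminated normal model stated above (variances $\sigma^2$ and $9\sigma^2$), using the standard large-sample results that $\mathrm{Var}(S_n)/(E\,S_n)^2$ behaves like $(\kappa - 1)/(4n)$, with $\kappa$ the ratio of the fourth central moment to the squared variance, and $\mathrm{Var}(d_n)/(E\,d_n)^2$ like $(\mu_2/(E|X|)^2 - 1)/n$. The formula and the function name are not quoted from the thesis.

```python
import math


def are(eps):
    """ARE of d_n relative to S_n under the 3-sigma contaminated normal."""
    # Mixture moments (sigma = 1): mu2 = 1 + 8 eps, mu4 = 3 (1 + 80 eps).
    s_term = (3.0 * (1.0 + 80.0 * eps) / (1.0 + 8.0 * eps) ** 2 - 1.0) / 4.0
    # E|X| = sqrt(2 / pi) * (1 + 2 eps) for the same mixture.
    d_term = math.pi * (1.0 + 8.0 * eps) / (2.0 * (1.0 + 2.0 * eps) ** 2) - 1.0
    return s_term / d_term


table = {eps: round(are(eps), 3) for eps in (0.0, 0.01, 0.05, 0.10, 1.0)}
```

Evaluating `are` at 0, 0.01, 0.05, 0.10 and 1.0 agrees with the tabulated entries to within rounding in the last digit, which also shows how small a contamination fraction is enough to make the mean absolute deviation the more efficient measure.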

Up to this point it seems reasonable to clean the data by rejecting
the outliers and then to use classical estimation and testing procedures
on the remainder, so that one ends with a better estimating model. In
reality this approach faces three basic pitfalls in application:

1) It is difficult to identify the real outliers unless one uses a
robust estimating model (as in the case of multiple linear regression).

2) Even if the original set of observations consists of normal
observations with some gross errors, the cleaned data will not be normal,
and the situation is even worse with a non-normal distribution.

3) As an empirical fact, the best rejection procedures do not quite
reach the performance of the best robust procedures, because robust
procedures make a smooth transition between full acceptance and full
rejection of an observation.

Thus a robust procedure should have the following features:

1) It should have a reasonably good (optimal or near optimal)
efficiency at the assumed model.

2) Small deviations from the model assumptions should affect the
model performance only slightly.

3) Somewhat larger deviations from the model should not completely
spoil the behavior of the model.


Basic  Types  of  Robust  Estimators 

The basic types of robust estimators are

1) M-Estimator

(The maximum likelihood type estimates)

2) L-Estimator

(The linear combinations of order statistics estimator)

3) R-Estimator

(The estimator derived from rank tests)

1. The M-Estimator

This kind of estimate is the most flexible one, and it generalizes
straightforwardly to multiparameter problems, even though (or, perhaps,
because) it is not automatically scale invariant and has to be
supplemented for practical applications by an auxiliary estimate of scale.

Definition: Any estimate $T_n$ defined by a minimization problem of the
form

$$\sum_i \rho(X_i; T_n) = \min$$

or by an implicit equation

$$\sum_i \psi(X_i; T_n) = 0,$$

where $\psi(x; \theta) = \frac{\partial}{\partial \theta} \rho(x; \theta)$,
is called an M-estimate. (This estimate is the ordinary M.L.E. if
$\rho(x; \theta) = -\log f(x; \theta)$.)

In the linear model we have

$$y = X\beta + \varepsilon$$

and we are interested in the expected value of the response
$$E(y) = E(X\beta + \varepsilon) = E(X\beta) + E(\varepsilon) = X E(\beta) + E(\varepsilon)$$

So in the case $E(\varepsilon) = 0$, we get

$$E(y) = X E(\beta),$$

i.e. we will basically be interested in the location parameter. Thus,
assuming

$$\rho(X_i; T_n) = \rho(X_i - T_n),$$

then

$$\sum_i \rho(X_i - T_n) = \min$$

or

$$\sum_i \psi(X_i - T_n) = 0$$

Defining the weights

$$W_i = \frac{\psi(X_i - T_n)}{X_i - T_n},$$

then

$$\sum_i W_i (X_i - T_n) = 0, \qquad T_n = \frac{\sum_i W_i X_i}{\sum_i W_i},$$

where the weights are dependent on the sample.
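
Because the weights depend on $T_n$ itself, the weighted-mean form suggests an iteration: compute weights from the current estimate, take the weighted mean, and repeat. The sketch below is one possible implementation under stated assumptions: it uses Huber's clipped $\psi$ with tuning constant 1.345, starts from the median, and rescales residuals by a normalized median absolute deviation as the auxiliary scale estimate. These choices, the function names, and the data are illustrative and are not taken from the thesis.

```python
import statistics


def huber_psi(r, k=1.345):
    """Huber's psi: the identity near zero, clipped at +/- k."""
    return max(-k, min(k, r))


def huber_location(sample, k=1.345, tol=1e-10, max_iter=200):
    """Location M-estimate by iterating T = sum(w x) / sum(w)."""
    t = statistics.median(sample)                     # starting value
    mad = statistics.median([abs(x - t) for x in sample])
    scale = mad / 0.6745 if mad > 0 else 1.0          # auxiliary scale (assumed)
    for _ in range(max_iter):
        weights = []
        for x in sample:
            r = (x - t) / scale
            # w = psi(r) / r, with the limiting value 1 at r = 0
            weights.append(1.0 if r == 0 else huber_psi(r, k) / r)
        new_t = sum(w * x for w, x in zip(weights, sample)) / sum(weights)
        if abs(new_t - t) < tol:
            return new_t
        t = new_t
    return t


data = [9.8, 10.1, 10.0, 9.9, 10.2, 50.0]             # one gross outlier
estimate = huber_location(data)
sample_mean = sum(data) / len(data)
```

For this sample the estimate settles near 10.06, while the sample mean is pulled to about 16.7 by the single outlier.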

For the functional form $T(F)$ corresponding to

$$\sum_i \psi(X_i; T_n) = 0,$$

it is not in general possible to define $T(F)$ to be a value of $t$ which
minimizes

$$\int \rho(x; t)\, F(dx)$$

For example, the median corresponds to $\rho(x; t) = |x - t|$, while

$$\int |x - t|\, F(dx) = \infty$$

identically in $t$ unless $F$ has a finite first absolute moment. A
simple solution is obtained by replacing $\rho(x; t)$ by
$\rho(x; t) - \rho(x; t_0)$ for some fixed $t_0$, i.e. in the case of the
median, minimize

$$\int (|x - t| - |x|)\, F(dx)$$

In a similar way the functional form in terms of $\psi$ is

$$\int \psi(x; T(F))\, F(dx) = 0$$

This form does not suffer from the previous difficulty, but it might
have more solutions, corresponding to local minima.


Influence  Function  of  M-Estimates 

The influence function describes the effect of adding one more
observation with value $x$ to a very large sample on the value of an
estimate or test statistic $T(F_n)$, where $F_n$ is the empirical
distribution function.

In the case of M-estimates the influence function was found to be
proportional to $\psi$ and is given as

$$IC(x; F, T) = \frac{\psi(x; T(F))}{-\int \left[\frac{\partial}{\partial \theta} \psi(x; \theta)\right]_{\theta = T(F)} F(dx)}$$

and in the case $\psi(x; \theta) = \psi(x - \theta)$ we obtain

$$IC(x; F, T) = \frac{\psi(x - T(F))}{\int \psi'(x - T(F))\, F(dx)}$$


2.  The  L-Estimates 

Consider a statistic that is a linear combination of order statistics
or, more generally, of some function $h$ of them:

$$T_n = \sum_{i=1}^{n} a_{ni}\, h(X_{(i)})$$

We assume that the weights are generated by a (signed) measure $M$ on
the interval $(0,1)$:

$$a_{ni} = \tfrac{1}{2} M\left\{\left(\frac{i-1}{n}, \frac{i}{n}\right)\right\} + \tfrac{1}{2} M\left\{\left[\frac{i-1}{n}, \frac{i}{n}\right]\right\}$$

(This choice of the weights preserves the total mass,
$\sum_i a_{ni} = M\{(0,1)\}$, and the symmetry of the coefficients, if
$M$ is symmetric about $t = \frac{1}{2}$.)

Then $T_n = T(F_n)$ derives from the functional

$$T(F) = \int h(F^{-1}(s))\, M(ds),$$

and this gives exact equality $T_n = T(F_n)$ if the integrand is
regularized at its discontinuity points, where it is taken to be equal to

$$\tfrac{1}{2}\, h(F_n^{-1}(s - 0)) + \tfrac{1}{2}\, h(F_n^{-1}(s + 0)),$$

where the inverse of any distribution function $F$ is defined in the
usual way as

$$F^{-1}(s) = \inf\{x \mid F(x) \ge s\}, \qquad 0 < s < 1$$
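
A familiar special case of a linear combination of order statistics is the trimmed mean, which puts equal weight on the central order statistics and zero weight on the extremes (here $h$ is the identity). The sketch below is a simplified illustration: the floor rule for the number of trimmed points, the function names, and the data are assumptions and are not taken from the thesis.

```python
def l_estimate(sample, weights):
    """Linear combination of order statistics: sum of a_i * x_(i)."""
    xs = sorted(sample)
    return sum(a * x for a, x in zip(weights, xs))


def trimmed_mean_weights(n, alpha):
    """Weights that drop floor(alpha * n) points at each end."""
    g = int(alpha * n)
    if 2 * g >= n:
        raise ValueError("alpha trims away the whole sample")
    kept = n - 2 * g
    return [0.0] * g + [1.0 / kept] * kept + [0.0] * g


data = [4.0, 100.0, 1.0, 3.0, 2.0]
trimmed = l_estimate(data, trimmed_mean_weights(len(data), 0.2))
plain_mean = l_estimate(data, [1.0 / len(data)] * len(data))
```

With 20 percent trimming the estimate is 3.0, the average of the three central order statistics, whereas equal weights on all five points give the ordinary mean of 22.0.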


Influence  Function  of  L-Estimates 

In a similar way to that for the M-estimate, we can find the influence
function of $T_s$, where $T_s = F^{-1}(s)$:

$$IC(x; F, T_s) = \begin{cases} \dfrac{s - 1}{f(F^{-1}(s))}, & \text{for } x < F^{-1}(s) \\[2ex] \dfrac{s}{f(F^{-1}(s))}, & \text{for } x > F^{-1}(s) \end{cases}$$

It is worthwhile to note here that the influence function exists only if
$F$ has a non-zero finite derivative $f$ at $F^{-1}(s)$.

Using the chain rule for differentiation, the influence function of
$h(T_s)$ is

$$IC(x; F, h(T_s)) = IC(x; F, T_s)\, h'(T_s)$$

and thus the influence function of the estimator $T$ itself will be

$$IC(x; F, T) = \int IC(x; F, h(T_s))\, M(ds)$$


3. R-Estimates


R-estimation is a procedure based on ranks. To illustrate the general
procedure, consider replacing one factor in the least squares objective
function $\sum_{i=1}^{n} (Y_i - X_i'\beta)^2$ by its rank: thus if $R_i$
is the rank of $Y_i - X_i'\beta$, then we wish to minimize
$\sum_{i=1}^{n} (Y_i - X_i'\beta) R_i$ (Ref 1:894).

Now consider a two-sample rank test for shift. Let $X_1, \ldots, X_m$
and $Y_1, \ldots, Y_n$ be two independent samples from the distributions
$F(x)$ and $G(x) = F(x - \Delta)$, respectively. Merge the two samples
into one of size $m + n$ and let $R_i$ be the rank of $X_i$ in the
combined sample. Let $a_i = a(i)$, $1 \le i \le m + n$, be some given
scores; then base a test of $\Delta = 0$ against $\Delta > 0$ on the test
statistic

$$S_{m,n} = \frac{1}{m} \sum_{i=1}^{m} a(R_i)$$


Usually, we assume that the scores $a_i$ are generated by some function
$J$ as follows

.. .  4  i  \ 

In the case of the Wilcoxon test, $J(t) = t - \frac{1}{2}$.

Estimates of shift $\Delta_n$ and of location $T_n$ can be derived from
such rank tests:

(1) In the two-sample case, adjust $\Delta_n$ such that
$S_{m,n} \approx 0$ when computed from $(X_1, \ldots, X_m)$ and
$(Y_1 - \Delta_n, \ldots, Y_n - \Delta_n)$.

(2) In the one-sample case, adjust $T_n$ such that $S_{n,n} \approx 0$
when computed from $(X_1, \ldots, X_n)$ and
$(2T_n - X_1, \ldots, 2T_n - X_n)$. So a mirror image of the first
sample is used as a second sample.
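
For the Wilcoxon scores the one-sample construction leads to a well-known closed form, the Hodges-Lehmann estimate: the median of the pairwise averages $(X_i + X_j)/2$. The sketch below computes it directly. The convention of including the pairs with $i = j$ is an assumption (variants differ on this point), and the function name and data are illustrative rather than taken from the thesis.

```python
import statistics


def hodges_lehmann(sample):
    """Median of the Walsh averages (x_i + x_j) / 2 over all pairs i <= j."""
    n = len(sample)
    walsh = [(sample[i] + sample[j]) / 2.0
             for i in range(n) for j in range(i, n)]
    return statistics.median(walsh)


data = [1.0, 2.0, 3.0, 4.0, 100.0]
location = hodges_lehmann(data)
```

For this sample the 15 Walsh averages have median 3.0, so the single large observation has little effect on the estimate, in contrast with the sample mean of 22.0.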
Influence Function of R-estimates

The influence function in this case is given as

$$IC(x; F, T) = \frac{U(x) - \int U(x) f(x)\, dx}{\int U'(x) f(x)\, dx},$$

where

$$U(x) = \int_0^x J'\!\left(\tfrac{1}{2}\left[F(y) + 1 - F(2T(F) - y)\right]\right) f(2T(F) - y)\, dy$$

For symmetric $F$ this can be simplified, since $U(x) = J(F(x))$; then

$$IC(x; F, T) = \frac{J(F(x))}{\int J'(F(x)) f(x)^2\, dx}$$


IV.  Multiple  Linear  Regression 


A regression model that involves more than one regressor variable is called a multiple regression model. Here we are going to discuss the fit and analysis of this model and shed some light on the measures of adequacy that are useful in multiple regression.


Multiple  Regression  and  Least  Squares 

Suppose that we have a certain response $y$ which may be related to $k$ regressor variables by the model

$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$$

This model is called a multiple linear regression model with $k$ regressors. The parameters $\beta_j$, $j = 0, 1, \ldots, k$, are called the regression coefficients. This model describes a hyperplane in the $k$-dimensional space of the regressor variables $X_j$. The parameter $\beta_j$ represents the expected change in the response $y$ per unit change in $X_j$ when all the remaining regressor variables $X_i$ ($i \ne j$) are held constant. For this reason the parameters $\beta_j$, $j = 1, 2, \ldots, k$, are often called partial regression coefficients.

Multiple linear regression models are often used as approximating functions. That is, the true functional relationship between $y$ and $X_1, X_2, \ldots, X_k$ is unknown, but over certain ranges of the regressor variables the linear regression model is an adequate approximation.

Models that are more complex in structure may often still be analyzed by multiple linear regression techniques. For instance the polynomial model of degree $k$ in one variable, which has the form

$$y = \sum_{j=0}^{k} \beta_j x^j,$$

can be easily modeled by using the substitution

$$X_j = x^j, \quad j = 0, 1, \ldots, k, \quad \text{with the coefficients } \beta_j \text{ unchanged and } X_0 = 1.$$

Thus the model will be the original linear model

$$y = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k$$

Similar transformations could transform the model under consideration into the general form of the linear model, keeping in mind that the linearity of the model means linearity in the $\beta$ coefficients and not in the independent variables.
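As a minimal sketch of the substitution $X_j = x^j$, the following Python fragment builds the design matrix of a polynomial model and fits it with the least squares normal equations $(X'X)b = X'y$. The helper names are hypothetical, and solving the normal equations by Gaussian elimination is chosen here only for brevity; it is not claimed to be the numerical method used in the study.

```python
def solve_linear_system(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z


def polynomial_design_matrix(x, degree):
    """Columns X_0 = 1, X_1 = x, ..., X_k = x**k."""
    return [[xi ** j for j in range(degree + 1)] for xi in x]


def least_squares(X, y):
    """Coefficients b solving the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)
```

The fit is linear in the coefficients even though the columns are nonlinear functions of $x$, which is the point made in the paragraph above.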


So, the basic idea behind the multiple linear regression model is to find a linear relation that can adequately approximate an unknown relation between a set of independent variables ($k$ independent variables) and a certain response $y$.

Mathematical  Model 

A  scientific  model  is  a  representation  of  some  subject  of  inquiry 
(such  as  objects,  events,  processes,  systems)  and  is  used  basically  for 
prediction  and  control.  This  scientific  model  is  basically  divided  into 
three  basic  types: 


1.  Iconic  model:  which  pictorially  or  visually  represents  certain  as¬ 
pects  of  a  system  (as  does  a  photograph  or  model  airplane). 

2. Analogue model: which employs one set of properties to represent some other set of properties which the system being studied possesses.


3.  Mathematical  (or  symbolic  model):  which  employs  symbols  to  designate 
properties  of  the  system  under  study  (by  means  of  a  mathematical  equation 
or  set  of  equations). 

Consequently  the  mathematical  model  often  used  by  scientists  has 
three  main  types: 


1. The functional model,


2.  The  control  model, 


3.  The  predictive  model 

1.  The  functional  model 

This kind of model exists if the true functional relationship between a response and the independent variables in a problem is known, so the response could be easily understood, controlled, and predicted. In practice there are few cases which can be easily modeled by a functional model. Even then, those models turn out to be very complicated, difficult to interpret and usually of nonlinear form. For this kind of model, the linear regression procedures do not apply, or else linear models can be used only as approximations to the correct models in iterative estimation procedures.

2.  The  control  model 

Even  if  it  is  known  completely,  the  functional  model  is  not  always 
suitable  for  controlling  a  response  variable.  For  example  if  the  model 
contains  the  ambient  temperature  as  an  independent  variable  in  the  model, 
this  temperature  is  not  controllable  in  the  sense  that  other  variables 
in  the  model  are  controllable.  Thus  a  model  which  contains  variables 
under  the  control  of  the  experimenter  is  essential  for  control  of  a 
response . 

A  useful  control  model  can  sometimes  be  constructed  by  multiple 
regression  techniques,  but  they  should  be  used  carefully  because  they 
are  very  dangerous  if  improperly  used  or  interpreted. 

3.  The  predictive  model 

When the functional model is very complex and when the ability to obtain independent estimates of the effects of the control variables is limited, one can often obtain a linear predictive model which, though it may be in some senses unrealistic, at least reproduces the main features of the behavior of the response under study. This type of model is very useful and under certain conditions can lead to real insight into the process or the system under study. It is in the construction of this type of predictive model that multiple regression techniques have their greatest contribution to make. These problems are usually referred to as "problems with messy data", that is, data in which much intercorrelation exists. The predictive model is not necessarily functional and need not be useful for control purposes. This, of course, does not make it useless. If nothing else, it can and does provide guidelines for further experimentation, it pinpoints important variables, and it is a very useful variable screening device.

Non-Normal  Error  Distribution 


1.  Consequences  of  Non-normal  Disturbances 

Here I discuss the violation of the normality assumption of the error term in the regression model:

$$y = X\beta + \varepsilon$$

The discussion is made in two phases, according to whether the error has a finite or an infinite variance.


a.  Finite  Variance  Case 

In this case the basic definitions and assumptions of the model are exactly the same, i.e.

1) $y$ is called the response, $\beta$ is the vector of coefficients, $X$ is the matrix of independent variables, and $\varepsilon$ is the error term.

2) $X$ is nonstochastic of rank $P$.

3) $\lim_{N \to \infty} N^{-1} X'X$ is a finite nonsingular matrix, and

4) The random vector $\varepsilon$ is such that $E(\varepsilon) = 0$ and $E(\varepsilon\varepsilon') = \sigma^2 I$, where $\sigma^2$ is finite.


Furthermore, if $\varepsilon$ is normally distributed:

1) The L.S. estimator $b = (X'X)^{-1} X'y$ is unbiased, minimum variance among the class of unbiased estimators, asymptotically efficient and consistent.

2) The variance estimator $\hat{\sigma}^2 = (y - Xb)'(y - Xb)/(N - P)$ is best quadratic unbiased, i.e. it has minimum variance of all estimators of $\sigma^2$ that are unbiased and quadratic in $y$; in addition it is asymptotically efficient and consistent.

3) $b$ is normally distributed, $(N - P)\hat{\sigma}^2/\sigma^2 \sim \chi^2_{(N-P)}$, and they are independent.

4) The F-test (for $R\beta = r$) and t-test (for the individual coefficients) are valid in finite samples.

On the other hand, if $\varepsilon$ is not normally distributed, we shall have:

1) $b$ is unbiased, minimum variance among the class of linear unbiased estimators, and consistent.

2) The variance estimator $\hat{\sigma}^2$ is unbiased and consistent.

3) $b$ and $\hat{\sigma}^2$ are no longer efficient or asymptotically efficient. If the form of the error distribution is known, we can use the likelihood function of $y$ to estimate $\beta$ and $\sigma^2$. In this case the estimator for $\beta$ will be nonlinear in general and, under appropriate regularity conditions, the estimators of $\beta$ and $\sigma^2$ will be asymptotically efficient. Otherwise it is better to use nonlinear robust estimators.

4) $b$ will not be normal and $(N - P)\hat{\sigma}^2/\sigma^2$ also will not be $\chi^2$. This means that the F- and t-tests for $\beta$ are not necessarily valid in finite samples.

b.  Infinite  Variance  Case 

In this case the error distribution has an infinite variance. As an example for this case take the Pareto distribution

$$f(\varepsilon) = C(\varepsilon - \varepsilon_0)^{-\alpha - 1}, \quad C, \varepsilon_0, \alpha \text{ constants.}$$

For $\alpha < 2$ the variance does not exist.

Because an infinite variance distribution has "thick tails", outliers will occur frequently. As a consequence of these outliers the L.S. technique will no longer lead to a sensible estimate of $\beta$, i.e. $\hat{\beta}$ will vary considerably in repeated samples. Also, it will be impossible to get a meaningful estimate for $\sigma^2$, and $\hat{\beta}$ will no longer have the minimum variance property, which in addition means that the F- and t-tests will be misleading.

Malinvaud (Ref 58:308) mentioned that, in practice, one can assume that the error distribution is bounded and this will lead to a finite variance. However this will not solve the problem, and in case of a relatively large number of outliers $\hat{\sigma}^2$ will be unstable in repeated samples and the estimates will behave as if the variance is infinite.

The  Double  Exponential  and  Technique  to  Estimate  B  Coefficients 


To demonstrate why it may be desirable to use an alternative to least squares when the observations are double exponential, consider the simple linear model

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n$$

where the error terms are independent random variables that follow the double exponential distribution

$$f(\varepsilon_i) = \frac{1}{2\sigma} e^{-|\varepsilon_i|/\sigma}, \quad -\infty < \varepsilon_i < \infty$$

The double exponential distribution is more pointed in the middle than the normal, and its tails go to zero as $|\varepsilon_i|$ goes to infinity. However, since the double exponential density goes to zero as $e^{-|\varepsilon_i|}$ goes to zero, while the normal density goes to zero as $e^{-\varepsilon_i^2}$ goes to zero, the double exponential distribution has heavier tails than the normal.
Here, we shall use the method of maximum likelihood to estimate $\beta_0$ and $\beta_1$. The likelihood function is

$$L(\beta_0, \beta_1) = \prod_{i=1}^{n} \frac{1}{2\sigma} e^{-|\varepsilon_i|/\sigma} = \frac{1}{(2\sigma)^n} e^{-\sum_{i=1}^{n} |\varepsilon_i|/\sigma}$$

So to maximize $L(\beta_0, \beta_1)$ is the same as maximizing the exponent $-\sum_i |\varepsilon_i|/\sigma$, or minimizing $\sum_i |\varepsilon_i|$, the sum of the absolute errors. It is known that the method of maximum likelihood applied to the regression model with normal errors leads to the least squares criterion. Thus the assumption of an error distribution with heavier tails than the normal implies that the method of least squares is no longer an optimal estimation technique. Moreover, the absolute error criterion weights outliers far less than least squares does ($\varepsilon_i^2$ is much greater than $|\varepsilon_i|$ in case of outliers). Minimizing the sum of the absolute errors is often called the $L_1$-norm regression problem. Least squares is the $L_2$-norm regression problem.

The $L_1$-norm regression problem can be formulated as a linear programming (LP) problem.
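For the simple linear model, the $L_1$ criterion can be illustrated without an LP solver. The sketch below relies on a standard property of the LP formulation: an optimal least absolute deviations line can always be chosen to pass through two of the data points, so for small $n$ it is enough to try every pair. This brute-force search is a stand-in for the simplex method, not the procedure used in the study, and the function names are hypothetical.

```python
from itertools import combinations


def sum_abs_deviations(x, y, b0, b1):
    """L1 criterion: sum of |y_i - b0 - b1 * x_i|."""
    return sum(abs(yi - b0 - b1 * xi) for xi, yi in zip(x, y))


def lad_line(x, y):
    """Least absolute deviations line found by trying every pair of points."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical pair does not define a regression line
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        cost = sum_abs_deviations(x, y, b0, b1)
        if best is None or cost < best[0]:
            best = (cost, b0, b1)
    cost, b0, b1 = best
    return b0, b1, cost
```

For example, with five points on the line $y = 1 + 2x$ and one gross outlier, `lad_line` returns the line through the five clean points, whereas least squares would be pulled strongly toward the outlier.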

Now let $X_{ij}$, $i = 1, 2, \ldots, n$, and $j = 1, 2, \ldots, k$, denote the set of $n$ observational measurements on $k$ independent variables, and $y_i$, $i = 1, \ldots, n$, denote the associated measurement on the dependent variable (response). The technique seeks the regression coefficients $\beta_j$ that

$$\text{Minimize } \sum_i \Big| \sum_j X_{ij}\beta_j - y_i \Big|$$

Charnes, Cooper and Ferguson (Ref 16) introduced a reduction which can transform the problem into

$$\text{Minimize } \sum_i (\varepsilon_{1i} + \varepsilon_{2i})$$

subject to

$$\sum_j X_{ij}\beta_j + \varepsilon_{1i} - \varepsilon_{2i} = y_i, \quad i = 1, 2, \ldots, n$$

$$\beta_j \text{ unrestricted}, \quad \varepsilon_{1i}, \varepsilon_{2i} \ge 0$$

where $\varepsilon_{1i}$ is the vertical deviation above the fitted line and $\varepsilon_{2i}$ is the vertical deviation below the fitted line for the $i$th observation. Thus $\varepsilon_{1i} + \varepsilon_{2i}$ is the absolute deviation between the fit $\sum_j X_{ij}\beta_j$ and $y_i$. By the nature of the linear programming model, $\varepsilon_{1i}$ and $\varepsilon_{2i}$ cannot both be strictly positive in an optimal solution. So, the problem is formulated as an L.P. problem of the form:

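$$\text{Minimize } C_1 Z_1 + \cdots + C_k Z_k$$

subject to

$$Z_1 a_{i1} + \cdots + Z_k a_{ik} \begin{cases} \ge d_i & \text{if } i \in N_1 \\ = d_i & \text{if } i \in N_2 \end{cases}$$

and

$$Z_h \begin{cases} \ge 0 & \forall\, h \in M_1 \\ \text{unrestricted} & \forall\, h \in M_2 \end{cases}$$

where $N_1$, $N_2$ is a partitioning of the linear relations (mutually exclusive and completely exhaustive partitioning) and similarly $M_1$, $M_2$ partitions the set of the variables.

The solution of the model in our case will be

$$X\hat{\beta} = \hat{y}$$

where $\hat{\beta}$ is the vector $(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k)'$.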

It is worth mentioning here that if the number of observations is large enough, the present model will be somewhat computationally difficult and it will be better to use the dual problem for determining $\beta$.

The dual model is still large since it contains $k + 2n$ relations. To reduce it, let

$$f_i = D_i + 1, \quad i = 1, 2, \ldots, n$$

where $D_i$ denotes the $i$th dual variable, and the dual model will be equivalent to

$$\text{Maximize } \sum_i y_i f_i - \sum_i y_i$$

subject to

$$\sum_i X_{ij} f_i \begin{cases} \le \sum_i X_{ij} & j \in M_1 \\ = \sum_i X_{ij} & j \in M_2 \end{cases}$$

$$0 \le f_i \le 2, \quad i = 1, 2, \ldots, n$$

which will give a model with $k$ linear relations and $n$ non-negative bounded variables. This final model could be solved quite rapidly for $k$ ($< 10$) by the simplex algorithm for bounded variables problems. On solving this model we can determine the values for $\beta$.

The Uniform Dist. and Minimax Criterion


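Again we shall consider that the error term is distributed uniformly with mean equal to zero and standard deviation equal to unity, i.e. $\varepsilon \sim U(-\sqrt{3}, \sqrt{3})$. Now consider also the case of a simple linear model

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n$$

where, for a general standard deviation $\sigma$, $\varepsilon_i \sim U(-\sqrt{3}\sigma, \sqrt{3}\sigma)$; then

$$f(\varepsilon_i) = \frac{1}{2\sqrt{3}\sigma}\, I_{(-\sqrt{3}\sigma,\, \sqrt{3}\sigma)}(\varepsilon_i)$$

where $I_{(-\sqrt{3}\sigma,\, \sqrt{3}\sigma)}(\varepsilon_i)$ is the indicator function.

The likelihood function as a function of the coefficients $\beta_0$, $\beta_1$ is given as

$$L(\beta_0, \beta_1) = \prod_{i=1}^{n} \frac{1}{2\sqrt{3}\sigma}\, I_{(-\sqrt{3}\sigma,\, \sqrt{3}\sigma)}(\varepsilon_i) = \left(\frac{1}{2\sqrt{3}\sigma}\right)^{n} \prod_{i=1}^{n} I_{(-\sqrt{3}\sigma,\, \sqrt{3}\sigma)}(\varepsilon_i)$$

This function will achieve its maximum when the difference between the first and last order statistics is minimum, i.e. the criterion for obtaining maximum likelihood estimators for $\beta_0$ and $\beta_1$ is to minimize the difference $(\varepsilon_{(n)} - \varepsilon_{(1)})$, in other words to minimize the maximum of $\varepsilon_i$ (in absolute value), or equivalently to

$$\text{Minimize } \{\text{maximum } |\varepsilon_i|\}$$

or in a general multiple linear model

$$\text{Minimize } \Big\{ \max_i \Big| \sum_j X_{ij}\beta_j - y_i \Big| \Big\}$$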
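The minimax criterion for a simple line can also be illustrated by brute force. The sketch below assumes distinct $x$ values and uses the standard fact that the minimax (Chebyshev) line is the best line for some three of the data points, on which the residuals have equal magnitude and alternating sign. For three points ordered by $x$, that line has the slope of the chord through the outer two points and lies halfway between the chord and the middle point. This search replaces the L.P. formulation purely for illustration; `chebyshev_line` is a hypothetical name.

```python
from itertools import combinations


def max_abs_deviation(x, y, b0, b1):
    """Minimax criterion: largest |y_i - b0 - b1 * x_i|."""
    return max(abs(yi - b0 - b1 * xi) for xi, yi in zip(x, y))


def chebyshev_line(x, y):
    """Minimax line found by trying the best line of every triple of points."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = None
    for i, j, k in combinations(order, 3):  # x[i] <= x[j] <= x[k]
        if x[i] == x[k]:
            continue
        # Equal and alternating residuals h, -h, h on the three points.
        b1 = (y[k] - y[i]) / (x[k] - x[i])
        b0 = (y[i] + y[j] - b1 * (x[i] + x[j])) / 2.0
        dev = max_abs_deviation(x, y, b0, b1)
        if best is None or dev < best[0]:
            best = (dev, b0, b1)
    dev, b0, b1 = best
    return b0, b1, dev
```

Every candidate is a feasible line, so the smallest maximum deviation among the candidates is the minimax value as long as the optimal line is one of them, which is the assumption stated above.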

Paralleling the previous case, Kelley transformed this problem into an L.P. model:

$$\text{Minimize } \delta, \quad \delta \ge 0$$

subject to

$$-\delta \le \sum_j X_{ij}\beta_j - y_i \le \delta, \quad i = 1, 2, \ldots, n$$

where $\delta$ is the minimized value of the maximum absolute deviation $|\sum_j X_{ij}\beta_j - y_i|$. Using the same approach as in the case of minimizing the sum of absolute deviations briefly discussed in the double exponential case, the model formulation will be:

$$\text{Minimize } \delta$$

subject to

$$-\sum_j X_{ij}\beta_j + \delta \ge -y_i, \quad i = 1, 2, \ldots, n$$

$$\sum_j X_{ij}\beta_j + \delta \ge y_i, \quad i = 1, 2, \ldots, n$$

$$\beta_j \begin{cases} \ge 0 & \forall\, j \in M_1 \\ \text{unrestricted} & \forall\, j \in M_2 \end{cases}$$

$$\delta \ge 0,$$

and the dual formulation will be

$$\text{Maximize } -\sum_i y_i d_{1i} + \sum_i y_i d_{2i}$$

subject to

$$-\sum_i X_{ij} d_{1i} + \sum_i X_{ij} d_{2i} \begin{cases} \le 0 & \forall\, j \in M_1 \\ = 0 & \forall\, j \in M_2 \end{cases}$$

$$\sum_i (d_{1i} + d_{2i}) \le 1$$

$$d_{1i},\, d_{2i} \ge 0$$

This dual model is a regular L.P. problem in $k + 1$ relations and could be solved by a standard simplex algorithm. If $d_{2i}$ ($d_{1i}$) is positive in the optimal solution of the dual problem, then the maximum deviation occurs for the $i$th point and this point will lie above (below) the fitted line. Thus the solution of this L.P. model will give the value of $\hat{\beta}$ as the estimated value of $\beta$ in our multiple linear regression model.
V. Results


In this part of the study I summarize the research done to find a technique that could handle the model of 27 observations in 11 independent variables.

The model chosen was obtained from Multiple Listing, Vol. 87 for area 12 (Erie, PA). To search for a technique that will handle such types of multiple linear regression, it was necessary to find some real hyperplane (fit) to take as a reference for how good the assumed technique is. In order to do that, a least squares regression was performed for the observed $Y$ and $X$. The coefficient vector $B_t$ from this model (L.S.) was multiplied by the $X$ matrix, after being augmented by a vector of 1's, to give the vector $Y_t$, which was considered as a real value of $Y$ which gives an exact fit:

$$Y_t = X B_t$$

The values of the matrix $X$ and the vector $B_t$ are shown in Table 1.

Description of Methods

The basic idea that was used at the very beginning of the study was to use the Q statistic introduced by Hogg (Ref 40) and defined as:

$$Q = [U(.05) - L(.05)] / [U(.5) - L(.5)]$$

where $U(\beta)$ is the average of the largest $n\beta$ order statistics (fractional items are used if $n\beta$ is not an integer) and where $L(\beta)$ has a similar definition using the smallest items. The Q statistic was basically used as a discriminator for the error distribution tail length. The reason for choosing Q as a discriminator was its convergence properties, which are much better than those of the kurtosis, since Q is a ratio of two linear functions of order statistics. In addition it is easy to see some similarity between Q and the following measure of tail length of the distribution function $F$:

$$[F^{-1}(.975) - F^{-1}(.025)] / [F^{-1}(.75) - F^{-1}(.25)]$$

The next step was to choose some value for the Q statistic upon which it will be possible to determine the tail length of the error distribution. As a matter of fact the basic idea was to come up with a comparative study between the three known regression techniques discussed earlier: least squares ($L_2$), minimization of the absolute deviation ($L_1$), and minimization of the maximum error ($L_\infty$). As was pointed out, these three techniques give maximum likelihood estimators for normal, double exponential and uniform error distributions, respectively. Thus these estimators will have desirable properties in expressing the unknown relation. To do that, a set of random deviates was generated from the three distributions and added to $Y_t$ in succession to give a new value of $Y$, which is considered as the observed value for $Y_t$, i.e.,

$$y = Y_t + \varepsilon$$

Trying different values for the Q statistic to get reasonable bounds ($Q_L$, $Q_U$) to discriminate the tail length of the distribution, it turned out that $Q_L = 2.21$ and $Q_U = 2.81$ should be used, i.e., if $Q \le 2.21$ then we can say that the distribution is uniform, if $2.21 < Q \le 2.81$ the distribution is normal, while if $Q > 2.81$ the distribution will be double exponential.

The number of times these bounds discriminate the distribution for the three known underlying ones, for a Monte Carlo run of size 1000, is as follows:
1. For Uniform: 872 times uniform, 127 times normal, 1 time D.E.

2. For Normal: 135 uniform, 619 normal, 246 D.E.

3. For D.E.: 17 uniform, 229 normal, 754 D.E.
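The Q statistic and the two-bound decision rule can be sketched as follows. This is a minimal illustration, assuming that a fractional item is handled by giving the next order statistic a partial weight and that the bounds 2.21 and 2.81 are inclusive on the lower side; the function names are hypothetical.

```python
def tail_average(sorted_values, fraction, largest):
    """Average of the largest (or smallest) n * fraction order statistics."""
    n = len(sorted_values)
    count = n * fraction
    values = sorted_values[::-1] if largest else sorted_values
    whole = int(count)
    total = sum(values[:whole])
    remainder = count - whole
    if remainder > 0 and whole < n:
        total += remainder * values[whole]  # fractional item, partial weight
    return total / count


def hogg_q(sample):
    """Q = [U(.05) - L(.05)] / [U(.5) - L(.5)]."""
    s = sorted(sample)
    numerator = tail_average(s, 0.05, True) - tail_average(s, 0.05, False)
    denominator = tail_average(s, 0.5, True) - tail_average(s, 0.5, False)
    return numerator / denominator


def classify_tail(q, lower=2.21, upper=2.81):
    """Tail-length decision using the bounds Q_L and Q_U quoted in the text."""
    if q <= lower:
        return "uniform"
    if q <= upper:
        return "normal"
    return "double exponential"
```

For the evenly spaced sample 1, 2, ..., 20 the statistic is $(20 - 1)/(15.5 - 5.5) = 1.9$, which the rule assigns to the uniform (short tail) class.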

As a start for knowing the distribution of the residuals through the use of the Q statistic, a linear least squares fit was performed for each of 1000 different cases of added error vector from the three considered distributions. Addition of an outlier to one of the observations at multiple values of the standard deviation is also considered during the start of the search. The results from this step are shown in Table A-1 for the number of times Q will discriminate each of the residual distributions when the underlying distribution is known, while Table B-1 exhibits the average error sum of squares, which is defined as:

$$ESS_{av} = \sum_{i=1}^{1000} (Y_{ti} - \hat{Y}_i)^2 / 1000$$

for the different cases discussed above. The steady increase in the values of $ESS_{av}$ with the outlier location with respect to the real line reveals the so-called leverage point effect on the fit, which can be demonstrated by the following graph:

[Figure: leverage point effect on the fit]


Using the residuals from L.S. and making a decision on using $L_1$, $L_2$, or $L_\infty$ according to the Q values ($Q_L = 2.21$, $Q_U = 2.81$) is shown in Table B-2. It is clear from this table that $ESS_{av}$ is still steadily increasing, since the Q statistic is discriminating the residuals most of the time (Table A-1) as normal due to the previously mentioned effect of leverage points. So it seemed to be a better notion to use $L_1$ instead of using $L_2$. The way Q discriminates the distribution for this case is displayed in Table A-2 and $ESS_{av}$ for $L_1$ is in Table B-3. Using the residuals from $L_1$ and with two limits again for Q, the decision was taken for the choice between $L_\infty$, $L_2$ or $L_1$. Table B-4 shows $ESS_{av}$ for this case. It seemed to be a reasonable idea to use only one limit for Q to discriminate between thick tail (D.E.) and thin tail (uniform and normal) distributions directly. The resulting $ESS_{av}$ is shown in Table B-5, which improves the values of $ESS_{av}$.

The previous approaches to the problem led to the notion of using one of the robust iterative techniques for handling leverage points, which as a result will give what could be called a robust Q that will give a better discrimination of the distribution without being affected by the outliers. As a starting step for this approach, Huber's function is used, defined by:

$$\psi(Z) = \begin{cases} Z & \text{if } |Z| \le 2 \\ 2\,\mathrm{sign}(Z) & \text{if } |Z| > 2 \end{cases}$$

and calculating the weight matrix

$$W_{i0} = \begin{cases} \dfrac{\psi[(Y_i - X_i\beta_0)/s]}{(Y_i - X_i\beta_0)/s} & \text{if } Y_i \ne X_i\beta_0 \\ 1 & \text{if } Y_i = X_i\beta_0 \end{cases}$$
which will give the coefficient vector as

$$\hat{\beta} = (X'W_0X)^{-1} X'W_0Y$$
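One reweighting step of this kind can be sketched for a simple line as follows. The cutoff of 2 follows the definition above; the closed-form weighted fit for a straight line, the scale argument `scale` (playing the role of $s$), and all function names are assumptions made for illustration only.

```python
def huber_psi(z, cutoff=2.0):
    """Huber's function: z inside the cutoff, cutoff * sign(z) outside."""
    if abs(z) <= cutoff:
        return z
    return cutoff if z > 0 else -cutoff


def huber_weight(residual, scale, cutoff=2.0):
    """Weight psi(z) / z for z = residual / scale, and 1 for a zero residual."""
    z = residual / scale
    if z == 0:
        return 1.0
    return huber_psi(z, cutoff) / z


def weighted_line_fit(x, y, w):
    """Weighted least squares for y = b0 + b1 * x, i.e. (X'WX)^{-1} X'WY."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def huber_step(x, y, b0, b1, scale):
    """One iteration: weights from the current residuals, then a weighted refit."""
    w = [huber_weight(yi - b0 - b1 * xi, scale) for xi, yi in zip(x, y)]
    return weighted_line_fit(x, y, w)
```

Repeating `huber_step` from an initial L.S. or $L_1$ fit gives the iterative scheme; a residual of 4 scale units, for instance, receives weight 0.5.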


This Huber's function has an influence function which limits the effect of outliers by holding their influence at a constant value. The influence function for this case is as shown in the following figure.

[Figure: influence function $\psi(Z)$ of Huber's function]


The iterative technique for robust regression needs an initial value to start iterations with. In this context a comparison was done between using L.S. or $L_1$ as initial estimation. The $ESS_{av}$ for these two cases are shown in Table B-5 and Table B-6 respectively, while Table A-3 and Table A-4 show the number of times Q discriminates each distribution. In this case only two iterations were used. Using the residuals from Huber, $ESS_{av}$ is calculated again and displayed in Table B-7. Up to this point an improvement in the values of $ESS_{av}$ for outliers at more than 100 S.D. is achieved over using the robust technique alone, but the value of $ESS_{av}$ is still very high. Trying some other robust techniques, we ended with using the Hampel function defined as:

$$\psi(Z) = \begin{cases} Z & |Z| \le 1.7 \\ 1.7\,\mathrm{sign}(Z) & 1.7 < |Z| \le 3.4 \\ 1.7\,\mathrm{sign}(Z)\,\dfrac{8.5 - |Z|}{8.5 - 3.4} & 3.4 < |Z| \le 8.5 \\ 0 & |Z| > 8.5 \end{cases}$$
which has an influence function as shown in the following figure.
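A small sketch of a three-part redescending function with the breakpoints 1.7, 3.4 and 8.5 quoted above is given here. The linear descent between 3.4 and 8.5 is the usual form of Hampel's function and is assumed because the source line is partly illegible; `hampel_psi` is a hypothetical name.

```python
def hampel_psi(z, a=1.7, b=3.4, c=8.5):
    """Three-part redescending psi: linear, constant, descending, then zero."""
    sign = 1.0 if z >= 0 else -1.0
    u = abs(z)
    if u <= a:
        return z
    if u <= b:
        return a * sign
    if u <= c:
        return a * sign * (c - u) / (c - b)  # assumed linear descent to zero at c
    return 0.0
```

Unlike Huber's function, this one gives a residual beyond 8.5 scale units no influence at all, while a residual of 5.95 is credited with only half of the maximum value 1.7.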


The $ESS_{av}$ from Hampel with L.S. and $L_1$ as initial estimation is shown in Table B-8 and Table B-9 respectively. In Table B-10 the resulting $ESS_{av}$ from using Hampel's residual is shown.

Coming to this point we started to search for a different approach to handle our problem. This search basically took 3 phases. Each phase is based on using the residuals themselves as our tool to make the decision.

a. Phase I:

Using the residual from $L_1$ and testing if it is greater than 3 S.D., then use the $L_1$ technique; if not, use $L_2$ (least squares). Table B-11 shows $ESS_{av}$ from this phase.

b. Phase II:

In this phase the residual from the Hampel iterative technique was used and the choice of technique was done as in Phase I. Table B-12 shows the resulting $ESS_{av}$ from this phase.

c. Phase III:

This is really a different approach, which gives the nearest fit to the real line throughout our study. The idea is to perform an initial fit and to move all points that are more than 3 S.D. away from this initial fit back onto the initial line. Then by redoing the fit a lower $ESS_{av}$ could be easily obtained. Table B-13 shows the resulting $ESS_{av}$ from this phase.

Conclusions

1. Presence of outliers made discrimination of the distribution difficult (Tables A-1 - A-4).

2. With no outliers, Least Squares gave the best fit.

3. Iterating Robust Estimators resulted in no improvement.

4. The Hampel Robust Estimator did not provide outlier protection.

5. The technique of detecting an outlier and using $L_1$ if it is greater than 3 S.D. and least squares otherwise was the best method of handling the outliers without modifying the data (B-12).

6. The method of mapping the outlier back onto the regression line if the residual is greater than 3 S.D. and using $L_1$ gave the best fit using all data points.

7. Alternatively the best fit is obtained by rejecting the points whose residuals are greater than 3 S.D. and repeating the L.S. fit. See 0 line of B-1.


Table  1 


The  values  of  independent  variables  and  the  calculated  B  coefficient 


used  to  generate  the  real  line . 


h 

S 

^4 

fs 

h 

h 

y 

y 

y 

4.9176 

1.0 

3.4720 

0.9980 

1.0 

7 

4 

42 

i 

0 

5.0208 

1.0 

3.5310 

1 . 5000 

2.0 

7 

4 

62 

1 

i 

0 

4.5429 

2.0 

2.2750 

1.1750 

1.0 

6 

3 

40 

2 

i 

0 

4.5573 

1.0 

4.0500 

1.2320 

1.0 

6 

3 

54 

4 

i 

0 

5.0597 

1.0 

4.4550 

1.1210 

1.0 

6 

3 

42 

3 

i 

0 

3.8910 

1.0 

4.4550 

0.9880 

1.0 

6 

3 

56 

2 

i 

0 

5.8980 

1.0 

5.8500 

1.2400 

1.0 

7 

3 

51 

2 

i 

1 

5.6039 

1.0 

9 . 5200 

1.5010 

0.0 

6 

3 

32 

1 

i 

0 

15.4202 

2.5 

9.800 

3.420 

2.0 

10 

5 

42 

2 

i 

1 

14.4598 

2.5 

12.800 

3.0000 

2.0 

9 

5 

14 

4 

i 

1 

5.8282 

1.0 

6.4350 

1.2250 

2.0 

6 

3 

32 

1 

i 

0 

5.3003 

1.0 

4.9883 

1.5520 

1.0 

6 

3 

30 

1 

2 

0 

6.2712 

1.0 

5.5200 

0.9750 

1.0 

6 

2 

30 

1 

2 

0 

5.9592 

1.0 

6.6660 

1.1210 

2.0 

6 

3 

32 

2 

1 

0 

5.0500 

1.0 

5.0000 

1.0200 

0.0 

5 

2 

46 

4 

4 

1 

8.2464 

1.5 

5.1500 

1.6640 

2.0 

8 

4 

50 

4 

1 

0 

6.6969 

1.5 

6.9020 

1.4880 

1.5 

7 

3 

22 

1 

1 

1 

7.7841 

1.5 

7.1020 

1.3760 

1.0 

6 

3 

17 

2 

1 

0 

9.0384 

1.0 

7.8000 

1 . 5000 

1.5 

7 

3 

23 

3 

3 

0 

5.9894 

1.0 

5 . 5200 

1.2560 

2.0 

6 

3 

40 

4 

1 

1 

7.5422 

1.5 

4.0000 

1.6900 

1.0 

6 

3 

22 

1 

1 

0 

8.7951 

1.5 

9.8900 

1.8200 

2.0 

8 

4 

50 

1 

1 

1 

6.0931 

1.5 

6.7265 

1.6520 

1.0 

6 

3 

44 

4 

1 

0 

8.3607 

1.5 

9.1500 

1.7770 

2.0 

8 

4 

48 

1 

1 

1 

8.1400 

1.0 

8.0000 

1 . 5040 

2.0 

7 

3 

3 

1 

3 

0 

9.1416 

1.5 

7.3262 

1.8310 

1.5 

8 

4 

31 

4 

1 

0 

12.000 

1.5 

5.0000 

1 . 2000 

2.0 

6 

3 

30 

3 

1 

1 

The  b  coefficient  used  as  real  fit 

3.2621860 
.84373136 
8.2369984 
.25660890 
14.035590 
B  =  1.6223667 

-1.0604545 
-.32560404 
-.074490869 
.96740379 
1.0447037 
2.6899793 
Table A-1


Number of times Q discriminates the tail length for uniform, normal, and double exponential error distributions after performing least squares fit.


                              Underlying Dist.
               Uniform              Normal                D.E.
N S.D       UR     N   D.E       UR     N   D.E       UR     N   D.E
0          222   633   145      126   596   278       61   450   489
1          231   623   146      131   593   276       79   435   486
3          247   600   153      151   584   265       98   458   444
6          283   584   133      206   608   186      162   527   311
9          319   578   103      268   606   126      234   574   192
100          8   992     0       12   988     0        9   991     0
1000         0  1000     0        0  1000     0        0  1000     0
10000        0  1000     0        0  1000     0        0  1000     0

Table A-2


Number of times Q discriminates the tail length for uniform, normal and double exponential error distribution after performing $L_1$.


                              Underlying Dist.
               Uniform              Normal                D.E.
N S.D       UR     N   D.E       UR     N   D.E       UR     N   D.E
0          246   595   159      136   550   314       64   446   490
1          218   554   228      115   506   379       64   437   499
3           63   327   610       27   252   721       10   139   851
6           23    94   883       14    88   898        3    73   924
9           22    93   885       15    86   899        3    71   926
100         22    93   885       15    86   899        3    70   927
1000        22    93   885       15    86   899        3    70   927
10000       22    93   885       15    86   899        3    70   927


Number of times Q discriminates the tail length for uniform, normal and double exponential error distribution after performing Huber robust technique with L.S. as initial estimation.


                              Underlying Dist.
               Uniform              Normal                D.E.
N S.D       UR     N   D.E       UR     N   D.E       UR     N   D.E
0          217   535   248      122   476   402       58   320   622
1          223   515   262      127   461   412       75   299   626
3          239   511   250      147   470   383       98   338   564
6          274   503   223      202   500   298      160   414   426
9          309   510   181      258   515   227      222   479   299
100          3   753   244       10   722   268        4   723   273
1000         0   671   329        0   681   319        0   656   344
10000        0   867   133        0   869   131        0   888   112

Table  A-4 


Number of times Q discriminates the tail length for uniform, normal, and double
exponential error distributions after performing the Huber robust technique
with [...] as initial estimation, using only one limit, Q = 2.81.


                 Underlying Dist.
         Uniform        Normal          D.E
NS.D       N   D.E       N   D.E       N   D.E
0        752   248     598   402     378   622
1        738   262     588   412     374   626
3        750   250     617   383     436   564
6        777   223     702   298     574   426
9        819   181     773   227     701   299
100      756   244     732   268     727   273
1000     671   329     681   319     656   344
10000    867   133     869   131     888   112

Table  B-2 


ESS_av for using L.S, calculating Q from its residuals, and choosing between
L_∞, L_2 or L_1 according to the value of Q (Q_L = 2.21, Q_U = 2[...]), with an
outlier thrown at N*S.D (a multiple of the S.D).


N          UNIFORM    NORMAL       D.E
0            14.86     14.03     13.03
1            15.33     14.92     14.40
3            22.83     22.82     21.95
6            46.58     96.63     45.35
9            84.17     81.84     80.15
100        63.91E2   61.90E2   61.44E2
1000       56.34E4   57.17E4   55.67E4
10000      72.78E6   72.94E6   74.54E6

Table  B-3 


ESS_av from L_1 with an outlier at N*S.D (a multiple of the standard
deviation) for a Monte Carlo of size 1000.


N          UNIFORM    NORMAL       D.E
0            20.06     17.22     13.73
1            21.23     18.35     14.95
3            29.87     26.98     23.53
6            52.66     48.53     43.86
9            73.02     67.68     59.09
16           84.93     80.51     70.06
100          85.13     81.09     70.65
1000         85.13     81.09     70.65
10000        85.13     81.09     70.65


Table  B-4 


ESS_av for making the decision according to the Q statistic calculated from the
L_[...] residuals and using L_∞, L.S or L_1 according to the Q value
(Q_L = 2.21, Q_U [...]).


N          UNIFORM    NORMAL       D.E
0            13.49     13.67     13.02
1            14.92     14.89     13.93
3            26.25     25.08     23.16
6            54.03     50.05     45.34
9            78.53     72.72     63.20
100        10.41E2   91.75E1   67.68E1
1000       96.53E3   84.76E3   61.29E3
10000      84.77E5   96.53E5   61.27E5

Table  B-4  (cont.) 

ESS_av for using [...] with Q for making the decision and with only one limit
(Q = 2.81), using either L.S or [...].

N          UNIFORM    NORMAL       D.E
0            13.73     13.46     12.80
1            14.87     14.49     13.91
3            21.92     21.69     21.56
6            45.67     44.40     44.57
9            81.52     79.42     77.11
100        63.87E2   61.75E2   61.38E2
1000       56.34E4   57.17E4   55.07E4
10000      71.78E6   72.94E6   74.54E6
Table B-5


ESS_av from Huber with an outlier at N*S.D (a multiple of the standard
deviation) for a Monte Carlo of size 1000, using L.S as initial estimation.


1.  One  Iteration 


N          UNIFORM    NORMAL       D.E
0            12.28     11.79     11.53
1            13.22     12.60     12.44
3            20.16     19.31     19.43
6            43.27     42.05     42.67
9            81.52     79.91     80.88
100        81.12E2   80.78E2   80.90E2
1000       78.98E4   78.98E4   78.97E4
10000      78.75E6   78.75E6   78.75E6
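
The caption does not spell out the Huber computation. A minimal sketch of one common form, iteratively reweighted least squares started from the L.S fit and run for a fixed number of iterations, is given below. The tuning constant c = 1.345, the MAD-based scale, the sample size, and the name huber_fit are assumptions made for illustration and are not taken from the source.

```python
import numpy as np

def huber_fit(X, y, n_iter=1, c=1.345):
    """Huber-type fit by reweighted least squares, starting from L.S.

    n_iter=0 returns the plain least squares estimate.
    c and the MAD scale are illustrative assumptions.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)    # L.S initial estimate
    for _ in range(n_iter):
        r = y - X @ beta
        scale = np.median(np.abs(r - np.median(r))) / 0.6745
        if scale == 0.0:
            break
        u = np.abs(r) / scale
        # Huber weights: 1 inside [-c, c], c/|u| outside
        w = np.minimum(1.0, c / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

rng = np.random.default_rng(1)
n = 30                                   # hypothetical sample size
x = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([1.0, 2.0])         # hypothetical true coefficients
y = X @ beta_true + rng.normal(0.0, 1.0, n)
y[n // 2] += 100.0                       # one outlier at 100 S.D.

beta_ls = huber_fit(X, y, n_iter=0)
beta_h1 = huber_fit(X, y, n_iter=1)
beta_h2 = huber_fit(X, y, n_iter=2)
err_ls = float(np.sum((beta_ls - beta_true) ** 2))
err_h1 = float(np.sum((beta_h1 - beta_true) ** 2))
err_h2 = float(np.sum((beta_h2 - beta_true) ** 2))
print(err_ls, err_h1, err_h2)
```

The squared coefficient error printed here is only a convenience for the sketch and is not claimed to be the ESS_av criterion tabulated above.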


N          UNIFORM    NORMAL       D.E
0            12.37     11.85     11.46
1            13.31     12.67     12.37
3            20.26     19.40     19.40
6            43.43     42.18     42.74
9            81.75     80.14     81.06
100        78.37E2   77.82E2   77.97E2
1000       74.48E4   74.49E4   74.47E4
10000      74.78E6   72.94E6   74.54E6



Table  B-7 

ESS_av for the decision according to Q calculated from the residuals from
Huber with L.S as initial estimation and using Q_L = 2.21, Q_U = 2.81, i.e.
using L_∞, L_2 or L_1 according to the value of Q (adaptive).


N          UNIFORM    NORMAL       D.E
0            14.86     14.03     13.03
1            15.87     15.04     14.34
3            22.83     22.82     21.95
6            46.58     96.63     45.35
9            84.17     81.84     80.15
100        63.91E2   61.90E2   61.44E2
1000       56.34E4   57.17E4   55.07E4
10000      72.78E6   72.94E6   74.54E6
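
The adaptive rule named in the caption can be sketched as a simple threshold test on Q. The limits 2.21 and 2.81 come from the caption. The direction of the mapping (small Q, short tails, L_∞; intermediate Q, L_2; large Q, long tails, L_1) is an assumption based on the usual reading of a tail-length statistic, and the function name is hypothetical.

```python
def choose_estimator(q, q_lower=2.21, q_upper=2.81):
    """Pick an estimation norm from the tail-length statistic Q.

    Assumed mapping (not stated in the caption):
      q < q_lower  -> short tails -> "L_inf"
      q > q_upper  -> long tails  -> "L_1"
      otherwise                   -> "L_2" (least squares)
    """
    if q < q_lower:
        return "L_inf"
    if q > q_upper:
        return "L_1"
    return "L_2"

# Illustrative Q values only, not taken from the tables.
choices = [choose_estimator(q) for q in (1.9, 2.58, 3.3)]
print(choices)
```

A single-limit variant, as in the tables that use only Q = 2.81, would drop the first branch and choose between L_2 and L_1.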

Table  B-8 


ESS_av from Hampel with an outlier at N*S.D (a multiple of the standard
deviation) for a Monte Carlo of size 1000, using L.S as initial estimation.

1.  One  Iteration 


N          UNIFORM    NORMAL       D.E
0            21.50     11.86     11.40
1            13.44     12.68     12.32
3            20.44     19.49     19.41
6            43.68     42.40     42.85
9            82.01     80.29     81.02
16         21.86E1   21.16E1   20.50E1
100        77.78E2   77.49E2   77.53E2
1000       75.52E4   75.56E4   75.56E4
10000      75.39E6   75.39E6   75.39E6

2.  Two  Iterations 


N          UNIFORM    NORMAL       D.E
0            12.68     12.01     11.35
1            13.63     12.84     12.30
3            20.66     19.68     19.46
6            44.01     42.70     43.11
9            82.53     80.76     81.38
16         20.52E1   19.17E1   17.24E1
100        71.52E2   70.97E2   71.02E2
1000       67.70E4   67.61E4   78.60E4
10000      67.29E5   67.29E5   67.29E5



Table  B-9 


ESS_av from Hampel with an outlier at N*S.D for a Monte Carlo of size 1000,
using [...] as initial estimation.


1. One Iteration


N          UNIFORM    NORMAL       D.E
0            12.23     11.71     11.30
1            13.16     12.53     12.25
3            20.09     19.30     19.35
6            43.09     41.91     42.42
9            80.91     78.87     79.17
16         21.86E1   21.16E1   20.50E1
10000      75.39E6   75.39E6   75.39E6

2.  Two  Iterations 


N          UNIFORM    NORMAL       D.E
0            12.31     11.80     11.24
1            13.25     12.61     12.20
3            20.19     19.43     19.41
6            43.19     41.98     42.50
9            80.65     78.06     77.65
16         20.52E1   19.17E1   17.24E1
100        13.57E2   24.45E2   64.17E2
10000      67.29E6   67.30E6   67.29E6

Table  B-10 


ESS_av from using L_1 or L_2 after making the decision according to Q
calculated from the residuals of Hampel (L.S as initial estimation).


N          UNIFORM    NORMAL       D.E
0            14.45     13.85     12.90
1            15.43     14.88     14.02
3            22.78     22.25     21.85
6            46.50     44.94     44.56
9            81.80     79.11     75.95
100        30.91E2   28.81E2   28.80E2
1000         85.13     81.09     70.65

N          UNIFORM    NORMAL       D.E
0            13.49     13.67     13.02
1            14.92     14.89     13.93
3            26.25     25.08     23.16
6            54.03     50.05     45.34
9            78.53     72.72     63.20
100        10.41E2   91.75E1   67.68E1
1000       96.53E3   84.76E3   61.29E3
10000      96.53E5   84.77E5   61.27E5